BLAKE and 256-bit advanced vector extensions

Authors

  • Samuel Neves
  • Jean-Philippe Aumasson
Abstract

Intel recently documented its AVX2 instruction set extension, which introduces support for 256-bit wide single-instruction multiple-data (SIMD) integer arithmetic over double (32-bit) and quad (64-bit) words. This will enable Intel’s future processors, starting with the Haswell architecture to be released in 2013, to fully support 4-way SIMD computation of 64-bit ARX algorithms (32-bit has been supported since SSE2). AVX2 also introduces instructions with the potential to speed up cryptographic functions, such as any-to-any permute and vectorized table lookup. In this paper we show how the AVX2 instructions will benefit the SHA-3 finalist hash function BLAKE, an algorithm that naturally lends itself to 4-way 32- or 64-bit SIMD implementations thanks to its inherent parallelism. We also wrote BLAKE-256 assembly code for AVX and AVX2, and measured for the former a speed of 7.62 cycles per byte, setting a new speed record.
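
The 4-way parallelism mentioned in the abstract can be illustrated with a minimal Python sketch of one BLAKE-256 column step. It is not the authors' assembly code: the function names are hypothetical, and the inputs `xs` and `ys` are assumed to be message words already XORed with the round constants, so the message permutation and constant tables are left out. The sketch computes the step once column by column and once row-wise, where each statement acts on four lanes in the way a single SIMD instruction would.

```python
MASK = 0xFFFFFFFF  # BLAKE-256 operates on 32-bit words


def rotr(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK


def g(a, b, c, d, x, y):
    """Scalar BLAKE-256 G function.

    x and y are assumed to be message words already XORed with
    the corresponding round constants.
    """
    a = (a + b + x) & MASK
    d = rotr(d ^ a, 16)
    c = (c + d) & MASK
    b = rotr(b ^ c, 12)
    a = (a + b + y) & MASK
    d = rotr(d ^ a, 8)
    c = (c + d) & MASK
    b = rotr(b ^ c, 7)
    return a, b, c, d


def column_step_scalar(v, xs, ys):
    """Apply G to the four columns of the 4x4 state, one column at a time."""
    v = list(v)
    for i in range(4):
        v[i], v[4 + i], v[8 + i], v[12 + i] = g(
            v[i], v[4 + i], v[8 + i], v[12 + i], xs[i], ys[i]
        )
    return v


def add4(p, q):
    """Lane-wise 32-bit addition of two 4-word rows."""
    return [(s + t) & MASK for s, t in zip(p, q)]


def xor_rotr4(p, q, n):
    """Lane-wise XOR followed by a right rotation by n bits."""
    return [rotr(s ^ t, n) for s, t in zip(p, q)]


def column_step_lanes(v, xs, ys):
    """Same column step written row-wise, four lanes per operation."""
    a, b, c, d = v[0:4], v[4:8], v[8:12], v[12:16]
    a = add4(add4(a, b), xs)
    d = xor_rotr4(d, a, 16)
    c = add4(c, d)
    b = xor_rotr4(b, c, 12)
    a = add4(add4(a, b), ys)
    d = xor_rotr4(d, a, 8)
    c = add4(c, d)
    b = xor_rotr4(b, c, 7)
    return a + b + c + d


# Arbitrary illustrative inputs, not test vectors from the BLAKE specification.
state = [(0x9E3779B9 * (i + 1)) & MASK for i in range(16)]
xs = [0x01234567, 0x89ABCDEF, 0xDEADBEEF, 0x0BADF00D]
ys = [0xCAFEBABE, 0xFEEDFACE, 0x8BADF00D, 0x00C0FFEE]
same = column_step_scalar(state, xs, ys) == column_step_lanes(state, xs, ys)
```

Both versions produce the same state because the four column computations do not depend on each other. In SIMD implementations the diagonal step is typically handled the same way after rotating the second, third, and fourth rows by one, two, and three lanes, so that each diagonal lines up as a column.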

Similar resources

Implementing BLAKE with AVX, AVX2, and XOP

In 2013 Intel will release the AVX2 instructions, which introduce 256-bit single-instruction multiple-data (SIMD) integer arithmetic. This will enable desktop and server processors from this vendor to support 4-way SIMD computation of 64-bit add-rotate-xor algorithms, as well as 8-way 32-bit SIMD computations. AVX2 also includes interesting instructions for cryptographic functions, like any-to-a...

Enhancing the Matrix Transpose Operation Using Intel AVX Instruction Set Extension

General-purpose microprocessors are augmented with short-vector instruction extensions in order to simultaneously process more than one data element using the same operation. This type of parallelism is known as data-parallel processing. Many scientific, engineering, and signal processing applications can be formulated as matrix operations. Therefore, accelerating these kernel operations on mic...

A j-lanes tree hashing mode and j-lanes SHA-256

j-lanes hashing is a tree mode that splits an input message into j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures, by coding a 4-lanes-SHA-256 implementation and measuring its performance on the latest 3rd Generation Intel Core. For message ranging 2KB to 132KB...

Practical Near-Collisions for Reduced Round Blake, Fugue, Hamsi and JH

A hash function is near-collision resistant if it is hard to find two messages with hash values that differ in only a small number of bits. In this study, we use hill climbing methods to evaluate the near-collision resistance of some of the round SHA-3 candidates. We practically obtained (i) 184/256-bit near-collision for the 2-round compression function of Blake-32; (ii) 192/256-bit near-col...

Speeding-up Document Scoring with Tree Ensembles using CPU SIMD Extensions

Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees is currently deemed one of the best solutions to effectively rank query results to be returned by large-scale Information Retrieval systems. This extended abstract briefly summarizes the work in [4] proposing V-QuickScorer (vQS), an algorithm which exploits SIMD vector extensions on modern CPUs to ...

Publication date: 2012